Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018
This paper describes FBK's submission to the end-to-end English-German speech
translation task at IWSLT 2018. Our system relies on a state-of-the-art model
based on LSTMs and CNNs, where the CNNs reduce the temporal dimension of
the audio input, which is typically much longer than machine translation
input. Our model was trained only on the audio-to-text parallel
data released for the task, and fine-tuned on cleaned subsets of the original
training corpus. The addition of weight normalization and label smoothing
improved the baseline system by 1.0 BLEU point on our validation set. The final
submission also featured checkpoint averaging within a training run and
ensemble decoding of models trained during multiple runs. On test data, our
best single model obtained a BLEU score of 9.7, while the ensemble obtained a
BLEU score of 10.24. Comment: 6 pages, 2 figures, system description at the 15th International
Workshop on Spoken Language Translation (IWSLT) 2018
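The checkpoint averaging mentioned above is a standard trick: the parameters of several checkpoints saved during one training run are averaged element-wise before decoding. A minimal sketch in PyTorch follows; the file names and the choice of the last five checkpoints are illustrative assumptions, not details taken from the paper.

import torch

def average_checkpoints(paths):
    # Element-wise average of the parameter tensors of several saved state_dicts.
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    for k in avg:
        avg[k] /= len(paths)
    return avg

# Hypothetical usage: average the last five checkpoints of a run and save the result.
averaged = average_checkpoints([f"checkpoint_{i}.pt" for i in range(45, 50)])
torch.save(averaged, "averaged.pt")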
The ITC-irst statistical machine translation system for IWSLT-2004
This paper focuses on the statistical machine translation system developed at ITC-irst. The system was employed in the evaluation campaign of the International Workshop on Spoken Language Translation 2004, in all three data set conditions of the Chinese-English track. Both the statistical model underlying the system and the system architecture are presented. Moreover, details are given on how the submitted runs were produced.
Enhancing Transformer for End-to-end Speech-to-Text Translation
Neural end-to-end architectures have been recently proposed for spoken language translation (SLT), following the state-of-the-art results obtained in machine translation (MT) and speech recognition (ASR). Motivated by this contiguity, we propose an SLT adaptation of Transformer (the state-of-the-art architecture in MT), which exploits the integration of ASR solutions to cope with long input sequences featuring low information density. Long audio representations hinder the training of large models due to Transformer's quadratic memory complexity. Moreover, for the sake of translation quality, handling such sequences requires capturing both short- and long-range dependencies between bidimensional features. Focusing on Transformer's encoder, our adaptation is based on: i) downsampling the input with convolutional neural networks, which enables model training on non cutting-edge GPUs, ii) modeling the bidimensional nature of the audio spectrogram with 2D components, and iii) adding a distance penalty to the attention, which is able to bias it towards short-range dependencies. Our experiments show that our SLT-adapted Transformer outperforms the RNN-based baseline both in translation quality and training time, setting the state-of-the-art performance on six language directions.
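The distance penalty in point iii) is, in essence, a position-dependent bias subtracted from the attention logits so that the softmax favours nearby time steps. The sketch below shows one plausible form of such a penalty (a logarithmic function of the distance between positions); the exact formulation used in the paper is not reproduced here.

import torch

def distance_penalty(seq_len):
    # penalty[i, j] grows with |i - j| and is zero on the diagonal
    idx = torch.arange(seq_len)
    dist = (idx[None, :] - idx[:, None]).abs().float()
    return torch.log1p(dist)

def penalized_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model); plain scaled dot-product attention
    # with the distance penalty subtracted from the scores before the softmax.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    scores = scores - distance_penalty(q.size(1))
    return torch.softmax(scores, dim=-1) @ v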
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
Translating from languages without productive grammatical gender like English
into gender-marked languages is a well-known difficulty for machines. This
difficulty is also due to the fact that the training data on which models are
built typically reflect the asymmetries of natural languages, gender bias
included. Exclusively fed with textual data, machine translation is
intrinsically constrained by the fact that the input sentence does not always
contain clues about the gender identity of the human entities it refers to. But
what happens with speech translation, where the input is an audio signal? Can
audio provide additional information to reduce gender bias? We present the
first thorough investigation of gender bias in speech translation, contributing
with: i) the release of a benchmark useful for future studies, and ii) the
comparison of different technologies (cascade and end-to-end) on two language
directions (English-Italian/French). Comment: 9 pages of content, accepted at ACL 2020
The Multilingual TEDx Corpus for Speech Recognition and Translation
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new talks and languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates multi-way parallel evaluation sets. We provide baselines in multiple ASR and ST settings, including multilingual models to improve translation performance for low-resource language pairs.
The IWSLT 2016 Evaluation Campaign
The IWSLT 2016 Evaluation Campaign featured two tasks:
the translation of talks and the translation of video conference
conversations. While the first task extends previously
offered tasks with talks from a different source, the second
task is completely new. For both tasks, three tracks were
organised: automatic speech recognition (ASR), spoken language
translation (SLT), and machine translation (MT). The main
translation directions offered were English to/from German
and English to French. Additionally, the MT track included
English to/from Arabic and Czech, as well as French to
English. This year we received run submissions
from 11 research labs. All runs were evaluated with objective
metrics, while submissions for two of the MT talk tasks
were also evaluated with human post-editing. Results of the
human evaluation show improvements over the best submissions
of last year.
The IWSLT 2018 Evaluation Campaign
The International Workshop on Spoken Language Translation
(IWSLT) 2018 Evaluation Campaign featured two tasks: the
low-resource machine translation task and the speech translation
task. In the first task, manually transcribed speech had to
be translated from Basque to English. Since this translation
direction is an under-resourced language pair, participants
were encouraged to use additional parallel data from
related languages. In the second task, participants had to
translate English audio into German text by building a full
speech-translation system. In the baseline condition, participants
were free to use any architecture, while they were restricted
to a single model for the end-to-end task.
This year, eight research groups took part in the Basque-English
translation task, and nine in the speech translation
task.